Private Federated Learning with Autotuned Compression
We propose new techniques for reducing communication in private federated
learning without the need for setting or tuning compression rates. Our
on-the-fly methods automatically adjust the compression rate based on the error
induced during training, while maintaining provable privacy guarantees through
the use of secure aggregation and differential privacy. Our techniques are
provably instance-optimal for mean estimation, meaning that they can adapt to
the "hardness of the problem" with minimal interactivity. We demonstrate the
effectiveness of our approach on real-world datasets by achieving favorable
compression rates without the need for tuning.
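A minimal sketch of the on-the-fly idea described above, assuming a simple uniform quantizer as the compressor: the bit-width is nudged up or down each round based on the relative error the compression induces. The quantizer, error target, and adjustment rule are illustrative assumptions, not the paper's algorithm, and the sketch omits the secure aggregation and differential privacy components.

```python
import numpy as np

def quantize(x, num_bits):
    """Uniformly quantize a vector to num_bits per coordinate (illustrative compressor)."""
    levels = 2 ** num_bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / max(levels, 1)
    if scale == 0:
        return x.copy()
    q = np.round((x - lo) / scale)
    return lo + q * scale

def autotune_round(update, num_bits, target_rel_error=0.05):
    """One round of a hypothetical autotuning loop: compress the model update,
    measure the induced error, and adjust the bit-width for the next round."""
    compressed = quantize(update, num_bits)
    rel_error = np.linalg.norm(update - compressed) / (np.linalg.norm(update) + 1e-12)
    if rel_error > target_rel_error:
        num_bits = min(num_bits + 1, 16)   # too lossy: spend more bits next round
    elif rel_error < target_rel_error / 2:
        num_bits = max(num_bits - 1, 1)    # comfortably accurate: compress harder
    return compressed, num_bits, rel_error

# toy usage
rng = np.random.default_rng(0)
bits = 4
for step in range(3):
    update = rng.normal(size=1000)
    _, bits, err = autotune_round(update, bits)
    print(f"step {step}: relative error {err:.3f}, next bit-width {bits}")
```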
Privacy Amplification for Matrix Mechanisms
Privacy amplification exploits randomness in data selection to provide
tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's
success in machine learning, but is not readily applicable to the newer
state-of-the-art algorithms. This is because these algorithms, known as
DP-FTRL, use the matrix mechanism to add correlated noise instead of
independent noise as in DP-SGD.
In this paper, we propose "MMCC", the first algorithm to analyze privacy
amplification via sampling for any generic matrix mechanism. MMCC is nearly
tight in that it approaches a lower bound as ε → 0. To analyze
correlated outputs in MMCC, we prove that they can be analyzed as if they were
independent, by conditioning them on prior outputs. Our "conditional
composition theorem" has broad utility: we use it to show that the noise added
to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with
amplification. Our amplification algorithm also has practical empirical
utility: we show it leads to significant improvement in the privacy-utility
trade-offs for DP-FTRL algorithms on standard benchmarks.
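A minimal sketch of the setting MMCC analyzes, contrasting DP-SGD-style independent noise with matrix-mechanism (DP-FTRL-style) correlated noise, plus the Poisson-subsampling randomness that amplification exploits. The matrix C, the noise scale, the number of examples, and the sampling rate are illustrative placeholders, not an optimized or analyzed mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, dim = 8, 5
sigma = 1.0

# Illustrative lower-triangular noise-correlating matrix C (not an optimized factorization).
C = np.tril(np.ones((n_steps, n_steps)))

# DP-SGD style: fresh, independent Gaussian noise in every step.
z_indep = rng.normal(scale=sigma, size=(n_steps, dim))

# Matrix-mechanism style (DP-FTRL): the noise seen at step t is row t of C @ Z,
# a fixed linear combination of i.i.d. seeds, so steps share randomness.
Z = rng.normal(scale=sigma, size=(n_steps, dim))
z_corr = C @ Z

# Amplification randomness: each example is included in a step's minibatch
# independently with probability q (Poisson subsampling).
q = 0.1
included = rng.random((n_steps, 100)) < q  # 100 hypothetical examples
print(z_indep.shape, z_corr.shape, included.mean())
```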
Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning
We introduce new differentially private (DP) mechanisms for gradient-based
machine learning (ML) with multiple passes (epochs) over a dataset,
substantially improving the achievable privacy-utility-computation tradeoffs.
We formalize the problem of DP mechanisms for adaptive streams with multiple
participations and introduce a non-trivial extension of online matrix
factorization DP mechanisms to our setting. This includes establishing the
necessary theory for sensitivity calculations and efficient computation of
optimal matrices. For some applications involving a large number of SGD steps, applying
these optimal techniques becomes computationally expensive. We thus design an
efficient Fourier-transform-based mechanism with only a minor utility loss.
Extensive empirical evaluation on both example-level DP for image
classification and user-level DP for language modeling demonstrates substantial
improvements over all previous methods, including the widely-used DP-SGD.
Though our primary application is to ML, our main DP results are applicable to
arbitrary linear queries and hence may have much broader applicability.
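For intuition, here is a minimal sketch of the single-participation matrix-factorization mechanism these multi-epoch results extend: a workload A of prefix sums is factored as A = BC, noise is added at the sensitivity of C, and B reconstructs the answers. The trivial factorization below (B = A, C = I) is only illustrative; the paper's contribution is computing good factorizations under multiple participations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.normal(size=n)                 # per-step gradients (1-D for simplicity)

A = np.tril(np.ones((n, n)))           # workload: all prefix sums of the stream

# A naive factorization A = B C with B = A, C = I; an optimized C would have
# lower sensitivity and is what the paper's machinery computes.
B, C = A.copy(), np.eye(n)

sigma = 1.0
sensitivity = np.max(np.linalg.norm(C, axis=0))   # max column L2 norm of C
noisy_prefix_sums = B @ (C @ x + rng.normal(scale=sigma * sensitivity, size=n))
print(noisy_prefix_sums)
```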
A multi-label, dual-output deep neural network for automated bug triaging
Bug tracking enables the monitoring and resolution of issues and bugs within
organizations. Bug triaging, or assigning bugs to the owner(s) who will resolve
them, is a critical component of this process because incorrect assignments are
common, wasting developer time and reducing bug resolution throughput. In
this work, we explore the use of a novel two-output deep neural network
architecture (Dual DNN) for triaging a bug to both a team and an individual
developer simultaneously. Dual DNN leverages this simultaneous prediction by
exploiting its own guess of the team classes to aid in developer assignment. A
multi-label classification approach is used for each of the two outputs to
learn from all interim owners, not just the last one who closed the bug. We
make use of a heuristic combination of the interim owners
(owner-importance-weighted labeling), which is converted into a probability mass
function (pmf). We employ a two-stage learning scheme, whereby the team portion
of the model is trained first and then held static to train the team--developer
and bug--developer relationships. The scheme employed to encode the
team--developer relationships is based on an organizational chart (org chart),
which renders the model robust to organizational changes as it can adapt to
role changes within an organization. There is an observed average lift (with
respect to both team and developer assignment) of 13%-points in 11-fold
incremental-learning cross-validation (IL-CV) accuracy for Dual DNN utilizing
owner-weighted labels compared with the traditional multi-class classification
approach. Furthermore, Dual DNN with owner-weighted labels achieves average
11-fold IL-CV accuracies of 76% (team assignment) and 55% (developer
assignment), outperforming reference models by 14%- and 25%-points,
respectively, on a proprietary dataset with 236,865 entries.
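A minimal sketch of turning a bug's interim-owner history into the pmf label described above. The specific weighting (extra weight on the closing owner, equal weight for other interim owners) is an illustrative assumption, not the paper's exact owner-importance heuristic.

```python
import numpy as np

def owners_to_pmf(interim_owners, developer_index, closer_weight=2.0):
    """Turn the sequence of interim owners of a bug into a soft label (pmf)
    over all developers. The closing owner gets extra weight; other interim
    owners share the rest. The specific weights are an illustrative assumption."""
    weights = np.zeros(len(developer_index))
    for i, owner in enumerate(interim_owners):
        w = closer_weight if i == len(interim_owners) - 1 else 1.0
        weights[developer_index[owner]] += w
    return weights / weights.sum()

# toy usage: the bug passed through bob and alice, and carol closed it
developer_index = {"alice": 0, "bob": 1, "carol": 2}
pmf = owners_to_pmf(["bob", "alice", "carol"], developer_index)
print(pmf)  # soft multi-label target instead of a single one-hot class
```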
Correlated Noise Provably Beats Independent Noise for Differentially Private Learning
Differentially private learning algorithms inject noise into the learning
process. While the most common private learning algorithm, DP-SGD, adds
independent Gaussian noise in each iteration, recent work on matrix
factorization mechanisms has shown empirically that introducing correlations in
the noise can greatly improve their utility. We characterize the asymptotic
learning utility for any choice of the correlation function, giving precise
analytical bounds for linear regression and as the solution to a convex program
for general convex functions. We show, using these bounds, how correlated noise
provably improves upon vanilla DP-SGD as a function of problem parameters such
as the effective dimension and condition number. Moreover, our analytical
expression for the near-optimal correlation function circumvents the cubic
complexity of the semi-definite program used to optimize the noise correlation
matrix in previous work. We validate our theory with experiments on private
deep learning. Our work matches or outperforms prior work while being efficient
both in terms of compute and memory.
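A minimal sketch of what "correlated noise" means operationally: the noise injected at step t is a fixed linear combination of i.i.d. Gaussian seeds, determined by a correlation function beta. The coefficients below are illustrative stand-ins; the paper derives near-optimal choices analytically.

```python
import numpy as np

def correlated_noise(n_steps, dim, beta, sigma=1.0, seed=0):
    """Generate the noise sequence w_t = sum_k beta[k] * z_{t-k}, where z_t are
    i.i.d. Gaussian seeds. beta = [1.0] recovers DP-SGD's independent noise;
    a negative trailing coefficient anti-correlates successive steps."""
    rng = np.random.default_rng(seed)
    z = rng.normal(scale=sigma, size=(n_steps, dim))
    w = np.zeros_like(z)
    for t in range(n_steps):
        for k, b in enumerate(beta):
            if t - k >= 0:
                w[t] += b * z[t - k]
    return w

# Independent (DP-SGD-style) vs. a simple anti-correlated choice (illustrative).
w_indep = correlated_noise(10, 3, beta=[1.0])
w_corr = correlated_noise(10, 3, beta=[1.0, -0.5])
print(w_indep.shape, w_corr.shape)
```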
User Inference Attacks on Large Language Models
Fine-tuning is a common and effective method for tailoring large language
models (LLMs) to specialized tasks and applications. In this paper, we study
the privacy implications of fine-tuning LLMs on user data. To this end, we
define a realistic threat model, called user inference, wherein an attacker
infers whether or not a user's data was used for fine-tuning. We implement
attacks for this threat model that require only a small set of samples from a
user (possibly different from the samples used for training) and black-box
access to the fine-tuned LLM. We find that LLMs are susceptible to user
inference attacks across a variety of fine-tuning datasets, at times with near
perfect attack success rates. Further, we investigate which properties make
users vulnerable to user inference, finding that outlier users (i.e., those with
data distributions sufficiently different from other users) and users who
contribute large quantities of data are most susceptible to attack. Finally, we
explore several heuristics for mitigating privacy attacks. We find that
interventions in the training algorithm, such as batch or per-example gradient
clipping and early stopping, fail to prevent user inference. However, limiting
the number of fine-tuning samples from a single user can reduce attack
effectiveness, albeit at the cost of reducing the total amount of fine-tuning
data.
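A minimal sketch of a user inference test statistic in the spirit of the threat model above, assuming black-box access to per-sample log-likelihoods from the fine-tuned model and from a reference model. The scoring functions, threshold calibration, and toy usage are hypothetical stand-ins, not the paper's exact attack.

```python
import numpy as np

def user_inference_score(user_samples, loglik_finetuned, loglik_reference):
    """Per-user test statistic: the mean log-likelihood ratio of the user's
    held-out samples under the fine-tuned vs. a reference model. A higher score
    suggests the user's data was in the fine-tuning set. The two scoring
    functions are assumed to be provided (e.g., an API returning log-probs)."""
    ratios = [loglik_finetuned(s) - loglik_reference(s) for s in user_samples]
    return float(np.mean(ratios))

def attack_decision(score, threshold):
    """Flag the user as 'in the fine-tuning data' if the statistic exceeds a
    threshold calibrated on known non-member users."""
    return score > threshold

# toy usage with stand-in scoring functions (a real attack would query the models)
score = user_inference_score(
    ["sample a", "sample b"],
    loglik_finetuned=lambda s: -0.9 * len(s),
    loglik_reference=lambda s: -1.0 * len(s),
)
print(score, attack_decision(score, threshold=0.5))
```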
(Amplified) Banded Matrix Factorization: A unified approach to private training
Matrix factorization (MF) mechanisms for differential privacy (DP) have
substantially improved the state-of-the-art in privacy-utility-computation
tradeoffs for ML applications in a variety of scenarios, but in both the
centralized and federated settings there remain instances where either MF
cannot be easily applied, or other algorithms provide better tradeoffs
(typically, as the privacy budget ε becomes small). In this work, we show how MF can
subsume prior state-of-the-art algorithms in both federated and centralized
training settings, across all privacy budgets. The key technique throughout is
the construction of MF mechanisms with banded matrices (lower-triangular
matrices with a bounded number of nonzero bands, including the main diagonal). For
cross-device federated learning (FL), this enables multiple participations with
a relaxed device participation schema compatible with practical FL
infrastructure (as demonstrated by a production deployment). In the centralized
setting, we prove that banded matrices enjoy the same privacy amplification
results as the ubiquitous DP-SGD algorithm, but can provide strictly better
performance in most scenarios -- this lets us always at least match DP-SGD, and
often outperform it.
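A minimal sketch of the banded structure: a lower-triangular matrix whose nonzero entries sit within a fixed number of bands from the main diagonal. The placeholder entries and band count are illustrative; the optimized mechanisms choose the entries to minimize error.

```python
import numpy as np

def banded_lower_triangular(n, num_bands):
    """A lower-triangular matrix whose entries are zero outside the first
    `num_bands` bands (main diagonal included). Entries here are placeholder
    ones; the optimized mechanisms choose them to minimize error."""
    M = np.tril(np.ones((n, n)))
    mask = (np.arange(n)[:, None] - np.arange(n)[None, :]) < num_bands
    return M * mask

C = banded_lower_triangular(n=6, num_bands=2)
print(C)
# With b bands, a device that participates at most once every b steps touches
# columns whose supports never overlap, which keeps the sensitivity analysis
# (and the amplification argument) tractable.
```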
Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy
Studying data memorization in neural language models helps us understand the
risks (e.g., to privacy or copyright) associated with models regurgitating
training data and aids in the development of countermeasures. Many prior works
-- and some recently deployed defenses -- focus on "verbatim memorization",
defined as a model generation that exactly matches a substring from the
training set. We argue that verbatim memorization definitions are too
restrictive and fail to capture more subtle forms of memorization.
Specifically, we design and implement an efficient defense that perfectly
prevents all verbatim memorization. And yet, we demonstrate that this "perfect"
filter does not prevent the leakage of training data. Indeed, it is easily
circumvented by plausible and minimally modified "style-transfer" prompts --
and in some cases even the non-modified original prompts -- to extract
memorized information. We conclude by discussing potential alternative
definitions and why defining memorization is a difficult yet crucial open
question for neural language models.
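A minimal sketch of a "perfect" verbatim-memorization filter of the kind critiqued above: reject any generation that shares a length-n character window with the training set. The n-gram indexing scheme and toy strings are illustrative assumptions; the point is that a trivially restyled generation passes the filter while leaking essentially the same content.

```python
def build_ngram_index(training_texts, n=10):
    """Collect every length-n character substring of the training data."""
    seen = set()
    for text in training_texts:
        for i in range(len(text) - n + 1):
            seen.add(text[i:i + n])
    return seen

def blocks_verbatim(generation, ngram_index, n=10):
    """'Perfect' verbatim filter: reject the generation if any length-n window
    appears verbatim in the training set."""
    return any(generation[i:i + n] in ngram_index
               for i in range(len(generation) - n + 1))

# toy usage: exact overlap is caught, a lightly restyled copy is not
train = ["the quick brown fox jumps over the lazy dog"]
index = build_ngram_index(train, n=10)
print(blocks_verbatim("... the quick brown fox jumps ...", index, n=10))  # True
print(blocks_verbatim("... The Quick Brown Fox Jumps ...", index, n=10))  # False
```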